Comparison of automated and human assignment of MeSH terms on publicly-available molecular datasets

Authors

  • David Ruau
  • Michael Mbagwu
  • Joel Dudley
  • Vijay Krishnan
  • Atul J. Butte
Abstract

Publicly available molecular datasets can be used for independent verification or investigative repurposing, but such reuse depends on the presence, consistency and quality of descriptive annotations. Annotation and indexing of molecular datasets using well-defined controlled vocabularies or ontologies enables accurate and systematic data discovery, yet the majority of molecular datasets available through public data repositories lack such annotations. A number of automated annotation methods have been developed; however, few systematic evaluations of the quality of the annotations these methods supply have been performed using annotations from standing public data repositories. Here, we compared manually assigned Medical Subject Heading (MeSH) annotations associated with experiments by data submitters in the PRoteomics IDEntification (PRIDE) proteomics data repository to automated MeSH annotations derived through the National Center for Biomedical Ontology Annotator and National Library of Medicine MetaMap programs. These programs were applied to free-text annotations for experiments in PRIDE. As many submitted datasets were referenced in publications, we used the manually curated MeSH annotations of those linked publications in MEDLINE as a "gold standard". Annotator and MetaMap exhibited recall performance 3-fold greater than that of the manual annotations. We connected PRIDE experiments in a network topology according to shared MeSH annotations and found 373 distinct clusters, many of which were found to be biologically coherent by network analysis. The results of this study suggest that both Annotator and MetaMap are capable of annotating public molecular datasets with a quality comparable to, and often exceeding, that of the actual data submitters, highlighting a continuing need to improve and apply automated methods to molecular datasets in public data repositories to maximize their value and utility.
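The two measurements mentioned in the abstract, recall against a gold standard and grouping of experiments by shared MeSH terms, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names, the toy term sets and the experiment identifiers are hypothetical, and connected components over a shared-term graph is only one plausible reading of "clusters", not necessarily the network analysis the authors used.

```python
from collections import defaultdict


def recall(predicted, gold):
    """Fraction of gold-standard terms that also appear in `predicted`."""
    gold = set(gold)
    if not gold:
        return 0.0
    return len(set(predicted) & gold) / len(gold)


def shared_term_clusters(annotations):
    """Group experiments into connected components, linking any two
    experiments that share at least one MeSH term (assumed definition)."""
    parent = {exp: exp for exp in annotations}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # term -> first experiment observed carrying it
    for exp, terms in annotations.items():
        for term in terms:
            if term in first_seen:
                parent[find(exp)] = find(first_seen[term])
            else:
                first_seen[term] = exp

    clusters = defaultdict(set)
    for exp in annotations:
        clusters[find(exp)].add(exp)
    return sorted(sorted(members) for members in clusters.values())


# Toy, made-up data (not taken from the study).
gold = {"Proteomics", "Liver", "Mass Spectrometry", "Humans"}
manual = {"Liver"}
automated = {"Proteomics", "Liver", "Humans", "Mice"}

manual_recall = recall(manual, gold)
automated_recall = recall(automated, gold)

toy_annotations = {
    "exp1": {"Liver", "Humans"},
    "exp2": {"Humans", "Plasma"},
    "exp3": {"Arabidopsis"},
}
clusters = shared_term_clusters(toy_annotations)
```

In this toy case the automated set recovers three of the four gold terms while the manual set recovers one, and `exp1` and `exp2` fall into one cluster because they share a term. Recall alone ignores spurious terms such as "Mice" in the automated set, so a fuller evaluation would also consider precision.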

Similar resources

3D Scene and Object Classification Based on Information Complexity of Depth Data

In this paper the problem of 3D scene and object classification from depth data is addressed. In contrast to high-dimensional feature-based representation, the depth data is described in a low dimensional space. In order to remedy the curse of dimensionality problem, the depth data is described by a sparse model over a learned dictionary. Exploiting the algorithmic information theory, a new def...

A Novel Multicast Tree Construction Algorithm for Multi-Radio Multi-Channel Wireless Mesh Networks

Many appealing multicast services such as on-demand TV, teleconferencing and online games can benefit from the high available bandwidth in multi-radio multi-channel wireless mesh networks. When multiple simultaneous transmissions use a similar channel to transmit data packets, network performance degrades to a large extent. Designing a good multicast tree to route data packets could enhance the...

Improved Univariate Microaggregation for Integer Values

Privacy issues during data publishing are an increasing concern for the entities involved. The problem is addressed in the field of statistical disclosure control with the aim of producing protected datasets that are also useful for interested end users such as government agencies and research communities. The problem of producing useful protected datasets is addressed in multiple computational priva...

Interference-Aware and Cluster Based Multicast Routing in Multi-Radio Multi-Channel Wireless Mesh Networks

Multicast routing is one of the most important services in Multi Radio Multi Channel (MRMC) Wireless Mesh Networks (WMN). Multicast routing performance in WMNs could be improved by choosing the best routes and the routes that have minimum interference to reach multicast receivers. In this paper we want to address the multicast routing problem for a given channel assignment in WMNs. The channels...

Comparison between conventional PCR and PCR - ELISA for detection of Brucella melitensis

Molecular detection techniques are believed to be key tools for both prevention and treatment follow-up of brucellosis in livestock and human beings. Consequently, rapid, reliable, easy-to-perform and automated systems for Brucella detection are urgently needed to allow early diagnosis and timely, adequate antibiotic therapy. Brucellosis is a worldwide re-emerging zoonosis causing high econ...

Journal:
  • Journal of biomedical informatics

Volume 44 Suppl 1  Issue 

Pages  -

Publication date 2011